Chapter 2: Sensors & Perception in Physical AI

"Without perception, there is no intelligence. A robot without sensors is like a mind without senses—capable of thought but unable to interact with reality."

Introduction to Robot Perception
Visual Sensors
Range and Depth Sensors
Motion and Orientation Sensors
Force and Tactile Sensors
Sensor Fusion
Perception Pipeline
Challenges in Real-World Perception
Practical Applications

Introduction to Robot Perception

Robot perception is the process by which robots sense and interpret their environment. Just as humans rely on their five senses to understand the world, robots use various sensors to gather information about their surroundings. This sensory data forms the foundation for all intelligent behavior—from simple obstacle avoidance to complex manipulation tasks.

Why Perception Matters

Perception enables robots to:

Navigate safely through dynamic environments
Identify objects, people, and obstacles
Manipulate objects with appropriate force and precision
Interact naturally with humans
Adapt to changing conditions in real-time

The Perception Challenge

Unlike controlled factory environments, real-world settings present numerous challenges:

🌦️ Varying lighting conditions (bright sun to darkness)
🏃 Dynamic obstacles (moving people, vehicles)
🎭 Object variations (different shapes, colors, textures)
📏 Scale differences (small screws to large furniture)
⚡ Real-time requirements (decisions in milliseconds)

Visual Sensors

Vision is arguably the most information-rich sensing modality for robots, providing detailed information about the environment's appearance, structure, and composition.

RGB Cameras

Standard color cameras capture images similar to human vision.

Specifications:

Resolution: 640x480 (VGA) to 4K (3840x2160)
Frame Rate: 30-120 fps
Field of View: 60°-180°
Output: RGB color images

Advantages:

✅ Rich color and texture information
✅ High resolution for detailed recognition
✅ Mature computer vision algorithms
✅ Relatively low cost

Limitations:

❌ No direct depth information
❌ Sensitive to lighting conditions
❌ Computationally intensive processing

Applications:

Object recognition and classification
Lane detection in autonomous vehicles
Human pose estimation
Visual servoing for manipulation

# Example: Capturing images with OpenCV
import cv2

# Initialize camera
camera = cv2.VideoCapture(0)
camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)

while True:
    ret, frame = camera.read()
    if not ret:
        break
    
    # Process frame
    cv2.imshow('Robot Vision', frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()

Depth Cameras

Depth cameras provide distance information for each pixel, creating a 3D representation of the scene.

Stereo Cameras

Use two cameras (like human eyes) to estimate depth through triangulation.

How They Work:

Capture two images from slightly different positions
Find corresponding points in both images
Calculate depth using disparity (difference in position)

Popular Models:

Intel RealSense D400 series
ZED 2 by Stereolabs
OAK-D by Luxonis

Example Configuration:

Parameter	Typical Value
Baseline	50-100mm
Depth Range	0.3-10m
Depth Accuracy	±2% at 2m
Resolution	1280x720

Structured Light Cameras

Project a known pattern (dots or lines) onto the scene and analyze deformation to compute depth.

Advantages:

✅ Works in low light
✅ High accuracy at close range
✅ Dense depth maps

Limitations:

❌ Limited range (typically under 5m)
❌ Struggles outdoors (sunlight interference)
❌ Multiple devices can interfere

Example: Microsoft Kinect (now discontinued but influential)

Time-of-Flight (ToF) Cameras

Measure the time light takes to travel to an object and back.

Advantages:

✅ Fast depth acquisition
✅ Simple depth calculation
✅ Works with moving objects

Applications:

Gesture recognition
3D scanning
Collision avoidance

# Example: Using Intel RealSense for depth perception
import pyrealsense2 as rs
import numpy as np

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
pipeline.start(config)

try:
    while True:
        # Wait for frames
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        
        # Convert to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())
        
        # Get distance at center point
        center_x, center_y = 320, 240
        distance = depth_frame.get_distance(center_x, center_y)
        print(f"Distance to center: {distance:.2f}m")

finally:
    pipeline.stop()

Thermal Cameras

Detect infrared radiation (heat) emitted by objects.

Use Cases:

Night vision for autonomous vehicles
Human detection in low visibility
Temperature monitoring in industrial settings
Search and rescue operations

Specifications:

Resolution: 320x240 to 640x512
Temperature Range: -20°C to 500°C
Thermal Sensitivity: ±2°C

Range and Depth Sensors

LiDAR (Light Detection and Ranging)

LiDAR uses laser pulses to measure distances, creating highly accurate 3D point clouds of the environment.

How LiDAR Works:

Emit laser pulse
Pulse reflects off object
Sensor detects return signal
Calculate distance: distance = (speed_of_light × time) / 2
Rotate to scan 360° (or use solid-state design)

Types of LiDAR:

Type	Scan Pattern	Use Case
Mechanical	360° rotation	Autonomous vehicles, mapping
Solid-State	Fixed FOV	Compact applications
MEMS	Mirror-based	Cost-effective solution
Flash	Entire scene at once	Fast moving robots

Specifications Comparison:

Parameter	2D LiDAR	3D LiDAR
Range	0.1-30m	0.5-200m
Angular Resolution	0.25°-1°	0.1°-0.4°
Scan Rate	5-15 Hz	10-20 Hz
Points per Second	10K-50K	300K-2M
Cost	$100-1K	$1K-75K

Popular Models:

Velodyne VLP-16: 16 channels, 100m range, autonomous vehicles
Ouster OS1: 64/128 channels, high resolution
Livox Mid-70: Non-repetitive scanning, cost-effective
SICK TiM: 2D scanning, industrial robotics

Advantages:

✅ Highly accurate (±2cm)
✅ Long range (up to 200m)
✅ Works in all lighting conditions
✅ Direct 3D information

Limitations:

❌ High cost (especially for dense 3D)
❌ Struggles with reflective/transparent surfaces
❌ Limited color information
❌ Moving parts (mechanical models)

# Example: Processing LiDAR data in ROS 2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan

class LidarProcessor(Node):
    def __init__(self):
        super().__init__('lidar_processor')
        self.subscription = self.create_subscription(
            LaserScan,
            '/scan',
            self.scan_callback,
            10)
    
    def scan_callback(self, msg):
        # Find minimum distance (closest obstacle)
        min_distance = min(msg.ranges)
        min_index = msg.ranges.index(min_distance)
        angle = msg.angle_min + min_index * msg.angle_increment
        
        self.get_logger().info(
            f'Closest obstacle: {min_distance:.2f}m at {angle:.2f} rad')
        
        # Check if obstacle is too close
        if min_distance < 0.5:
            self.get_logger().warn('DANGER: Obstacle too close!')

def main():
    rclpy.init()
    node = LidarProcessor()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

Radar

Radio Detection and Ranging uses radio waves to detect objects and measure velocity.

Advantages over LiDAR:

✅ Works in fog, rain, dust
✅ Measures velocity directly (Doppler effect)
✅ Long range
✅ Lower cost

Limitations:

❌ Lower resolution than LiDAR
❌ Less accurate for static objects
❌ Reflections from metal surfaces

Applications:

Automotive adaptive cruise control
Collision avoidance
Speed measurement
Long-range detection

Ultrasonic Sensors

Use sound waves (40 kHz) to measure distance.

Characteristics:

Range: 2cm to 4m
Beam Width: 15°-30° (cone-shaped)
Update Rate: 10-20 Hz
Cost: Very low ($2-20)

Advantages:

✅ Very inexpensive
✅ Simple to use
✅ Reliable for close-range detection

Limitations:

❌ Short range
❌ Affected by soft/angled surfaces
❌ Slow update rate
❌ Wide beam (poor directionality)

Common Uses:

Parking sensors
Obstacle detection for mobile robots
Liquid level measurement
Proximity detection

Motion and Orientation Sensors

Inertial Measurement Unit (IMU)

IMUs measure acceleration and rotation, essential for understanding robot motion and orientation.

Components:

Accelerometer (3-axis)
- Measures linear acceleration
- Detects gravity direction
- Senses vibration and shock
Gyroscope (3-axis)
- Measures angular velocity
- Detects rotation rate
- High-frequency motion tracking
Magnetometer (3-axis, optional)
- Measures magnetic field
- Provides absolute heading
- Compass functionality

9-DOF IMU (Degrees of Freedom):

Sensor	Axes	Measurement
Accelerometer	X, Y, Z	Linear acceleration (m/s²)
Gyroscope	X, Y, Z	Angular velocity (rad/s)
Magnetometer	X, Y, Z	Magnetic field (μT)

Sensor Fusion:

IMUs typically combine all sensor data to estimate orientation:

Orientation = integrate(gyroscope) + correct_with(accelerometer + magnetometer)

Common IMU Models:

MPU-6050/9250: Low-cost, hobbyist projects
BNO055: Built-in sensor fusion
Bosch BMI088: High-performance automotive
LORD MicroStrain: Industrial-grade

Applications:

Balancing for bipedal robots
Drone stabilization
Gait analysis
Vehicle dynamics

# Example: Reading IMU data with ROS 2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Imu
import math

class IMUMonitor(Node):
    def __init__(self):
        super().__init__('imu_monitor')
        self.subscription = self.create_subscription(
            Imu,
            '/imu/data',
            self.imu_callback,
            10)
    
    def imu_callback(self, msg):
        # Extract orientation (quaternion)
        qx = msg.orientation.x
        qy = msg.orientation.y
        qz = msg.orientation.z
        qw = msg.orientation.w
        
        # Convert to Euler angles (roll, pitch, yaw)
        roll = math.atan2(2*(qw*qx + qy*qz), 1 - 2*(qx**2 + qy**2))
        pitch = math.asin(2*(qw*qy - qz*qx))
        yaw = math.atan2(2*(qw*qz + qx*qy), 1 - 2*(qy**2 + qz**2))
        
        # Extract linear acceleration
        ax = msg.linear_acceleration.x
        ay = msg.linear_acceleration.y
        az = msg.linear_acceleration.z
        
        self.get_logger().info(
            f'Roll: {math.degrees(roll):.1f}° '
            f'Pitch: {math.degrees(pitch):.1f}° '
            f'Yaw: {math.degrees(yaw):.1f}°')

Wheel Encoders

Measure wheel rotation to estimate robot position and velocity.

Types:

Optical: LED and photodetector
Magnetic: Hall effect sensors
Capacitive: Capacitance changes

Key Specifications:

Resolution: Pulses per revolution (PPR)
- Low: 12-64 PPR
- Medium: 100-512 PPR
- High: 1024-4096 PPR
Accuracy: ±0.1° to ±0.01°

Odometry Calculation:

# Simple differential drive odometry
class Odometry:
    def __init__(self, wheel_radius, wheel_base):
        self.r = wheel_radius  # meters
        self.L = wheel_base    # meters
        self.x = 0.0
        self.y = 0.0
        self.theta = 0.0
    
    def update(self, left_ticks, right_ticks, ticks_per_rev):
        # Convert ticks to distance
        left_dist = (left_ticks / ticks_per_rev) * 2 * 3.14159 * self.r
        right_dist = (right_ticks / ticks_per_rev) * 2 * 3.14159 * self.r
        
        # Calculate center distance and rotation
        center_dist = (left_dist + right_dist) / 2
        d_theta = (right_dist - left_dist) / self.L
        
        # Update pose
        self.x += center_dist * math.cos(self.theta + d_theta/2)
        self.y += center_dist * math.sin(self.theta + d_theta/2)
        self.theta += d_theta

Force and Tactile Sensors

Force/Torque Sensors

Measure forces and moments applied to a robot, crucial for manipulation and contact interactions.

6-DOF Force/Torque Sensor:

Measurement	Symbol	Unit
Force X	Fx	Newton (N)
Force Y	Fy	Newton (N)
Force Z	Fz	Newton (N)
Torque X	Tx	Newton-meter (Nm)
Torque Y	Ty	Newton-meter (Nm)
Torque Z	Tz	Newton-meter (Nm)

Applications:

Precise object grasping
Assembly operations (peg-in-hole)
Human-robot collaboration (safety)
Walking robots (ground contact force)
Surgical robots (tissue interaction)

Example: ATI Mini40

Range: ±40N (Fx, Fy), ±120N (Fz)
Resolution: 1/50 N
Compact size (40mm diameter)

Tactile Sensors

Provide information about contact, pressure distribution, and texture.

Types:

Resistive: Pressure changes resistance
Capacitive: Pressure changes capacitance
Piezoelectric: Pressure generates voltage
Optical: Pressure deforms optical waveguides

Applications:

Robotic grippers (grasp stability)
Prosthetic hands (feedback)
Texture recognition
Slip detection

Example: BioTac:

Mimics human fingertip
19 sensors (pressure, temperature, vibration)
Can distinguish 117 materials with 95% accuracy

Sensor Fusion

Single sensors have limitations. Sensor fusion combines data from multiple sensors to create a more accurate and robust perception.

Why Sensor Fusion?

Individual Sensor Limitations:

Sensor	Strengths	Weaknesses
Camera	Rich detail, color	No depth, light-dependent
LiDAR	Accurate 3D	Expensive, no color
Radar	All-weather, velocity	Low resolution
IMU	High-rate motion	Drift over time
GPS	Absolute position	Poor indoors, low rate

Fusion Benefits:

✅ Compensates for individual weaknesses
✅ Increases reliability
✅ Provides redundancy
✅ Improves accuracy

Fusion Techniques

1. Kalman Filter

Optimal for linear systems with Gaussian noise.

Prediction:    x̂ₖ = Aₖxₖ₋₁ + Bₖuₖ
Update:        xₖ = x̂ₖ + Kₖ(zₖ - Hx̂ₖ)

2. Extended Kalman Filter (EKF)

Handles nonlinear systems (e.g., robot orientation).

3. Particle Filter

Uses multiple "particles" to represent probability distribution.

4. Deep Learning

Neural networks learn optimal fusion from data.

Example: Camera + LiDAR Fusion

# Simplified camera-LiDAR fusion for object detection
class SensorFusion:
    def __init__(self):
        self.camera_detections = []
        self.lidar_points = []
    
    def fuse(self):
        fused_objects = []
        
        for detection in self.camera_detections:
            # Get 2D bounding box from camera
            x1, y1, x2, y2 = detection['bbox']
            object_class = detection['class']
            
            # Project LiDAR points into camera frame
            points_in_box = self.get_lidar_in_bbox(x1, y1, x2, y2)
            
            # Calculate 3D position from LiDAR
            if len(points_in_box) > 0:
                position_3d = np.mean(points_in_box, axis=0)
                
                fused_objects.append({
                    'class': object_class,
                    'position': position_3d,
                    'confidence': detection['confidence']
                })
        
        return fused_objects

Perception Pipeline

A complete perception system processes raw sensor data through multiple stages to extract meaningful information.

Standard Pipeline:

1. Data Acquisition
   ↓
2. Preprocessing
   ↓
3. Feature Extraction
   ↓
4. Object Detection/Recognition
   ↓
5. Tracking
   ↓
6. Scene Understanding
   ↓
7. Decision Making

Stage Details:

1. Data Acquisition

Read sensor data at appropriate rates
Synchronize multi-sensor data
Buffer for processing

2. Preprocessing

Noise filtering
Calibration correction
Data format conversion

3. Feature Extraction

Edges, corners, keypoints (vision)
Planar surfaces (LiDAR)
Motion patterns (IMU)

4. Detection/Recognition

Object detection (YOLO, Faster R-CNN)
Semantic segmentation
Point cloud clustering

5. Tracking

Maintain object identity over time
Predict future positions
Handle occlusions

6. Scene Understanding

Build occupancy map
Classify drivable surfaces
Understand spatial relationships

7. Decision Making

Path planning
Behavior selection
Control commands

Challenges in Real-World Perception

Environmental Challenges

Lighting Variations:

Bright sunlight causes glare and washout
Darkness requires active illumination
Shadows create false obstacles

Weather Conditions:

Rain/snow affects optical sensors
Fog reduces visibility range
Ice/mud affects traction sensing

Dynamic Environments:

Moving obstacles (pedestrians, vehicles)
Changing layouts
Temporary obstructions

Technical Challenges

Computational Limits:

Real-time processing requirements
Limited onboard computing
Power constraints

Calibration Drift:

Sensors shift over time
Temperature effects
Mechanical wear

Sensor Failures:

Hardware malfunctions
Blocked sensors
Data corruption

Solutions:

Challenge	Solution
Lighting	Multi-modal sensing, HDR cameras
Weather	Radar + LiDAR fusion, thermal imaging
Computation	Edge AI accelerators, efficient algorithms
Failures	Redundancy, fault detection, graceful degradation

Practical Applications

Autonomous Vehicles

Sensor Suite:

📷 8+ cameras (360° coverage)
🔴 5+ LiDAR units
📡 6+ radar sensors
🧭 High-precision GPS + IMU
🌐 V2X communication

Perception Tasks:

Lane detection
Traffic sign recognition
Pedestrian detection
Vehicle tracking
Free space estimation

Warehouse Robots

Sensors:

2D LiDAR for navigation
Depth camera for shelf detection
Encoders for odometry
Bump sensors for safety

Tasks:

Autonomous navigation
Pallet detection
Collision avoidance
Localization

Humanoid Robots

Sensor Requirements:

Head-mounted cameras
Torso LiDAR/cameras
IMU for balance
Force sensors in feet
Tactile sensors in hands

Perception Needs:

Human detection and tracking
Object recognition and grasping
Terrain classification
Balance maintenance

Summary

Robot perception is the foundation of intelligent physical AI systems. By combining multiple sensor modalities—vision, depth, motion, and touch—robots can build comprehensive models of their environment and interact safely and effectively with the physical world.

Key Takeaways:

✅ Different sensors provide complementary information
✅ No single sensor is perfect—fusion is essential
✅ Real-time processing is critical for robot control
✅ Environmental challenges require robust solutions
✅ Perception enables all higher-level robot behaviors

Next Steps:

In the following chapters, we'll explore how robots use this perceptual information to make decisions, plan actions, and execute complex tasks in the physical world.

This chapter provided the foundation for understanding robot sensors and perception. The next chapter will dive into ROS 2 architecture, the middleware that connects sensors, algorithms, and actuators into complete robotic systems.

Table of Contents​

Introduction to Robot Perception​

Why Perception Matters​

The Perception Challenge​

Visual Sensors​

RGB Cameras​

Depth Cameras​

Stereo Cameras​

Structured Light Cameras​

Time-of-Flight (ToF) Cameras​

Thermal Cameras​

Range and Depth Sensors​

LiDAR (Light Detection and Ranging)​

Radar​

Ultrasonic Sensors​

Motion and Orientation Sensors​

Inertial Measurement Unit (IMU)​

Wheel Encoders​

Force and Tactile Sensors​

Force/Torque Sensors​

Tactile Sensors​

Sensor Fusion​

Why Sensor Fusion?​

Fusion Techniques​

Example: Camera + LiDAR Fusion​

Perception Pipeline​

Standard Pipeline:​

Stage Details:​

Challenges in Real-World Perception​

Environmental Challenges​

Technical Challenges​

Solutions:​

Practical Applications​

Autonomous Vehicles​

Warehouse Robots​

Humanoid Robots​

Summary​

Table of Contents

Introduction to Robot Perception

Why Perception Matters

The Perception Challenge

Visual Sensors

RGB Cameras

Depth Cameras

Stereo Cameras

Structured Light Cameras

Time-of-Flight (ToF) Cameras

Thermal Cameras

Range and Depth Sensors

LiDAR (Light Detection and Ranging)

Radar

Ultrasonic Sensors

Motion and Orientation Sensors

Inertial Measurement Unit (IMU)

Wheel Encoders

Force and Tactile Sensors

Force/Torque Sensors

Tactile Sensors

Sensor Fusion

Why Sensor Fusion?

Fusion Techniques

Example: Camera + LiDAR Fusion

Perception Pipeline

Standard Pipeline:

Stage Details:

Challenges in Real-World Perception

Environmental Challenges

Technical Challenges

Solutions:

Practical Applications

Autonomous Vehicles

Warehouse Robots

Humanoid Robots

Summary